Heavy-tailed random error in quantum Monte Carlo 
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The combination of continuum Many-Body Quantum physics and Monte Carlo methods provide 
a powerful and well established approach to first principles calculations for large systems. Replacing 
the exact solution of the problem with a statistical estimate requires a measure of the random error 
in the estimate for it to be useful. Such a measure of confidence is usually provided by assuming the 
Central Limit Theorem to hold true. In what follows it is demonstrated that, for the most popular 
implementation of the Variational Monte Carlo method, the Central Limit Theorem has limited 
validity, or is invalid and must be replaced by a Generalised Central Limit Theorem. Estimates of 
the total energy and the variance of the local energy are examined in detail, and shown to exhibit 
uncontrolled statistical errors through an explicit derivation of the distribution of the random error. 
Several examples are given of estimated quantities for which the Central Limit Theorem is not valid. 
The approach used is generally applicable to characterising the random error of estimates, and to 
Quantum Monte Carlo methods beyond Variational Monte Carlo. 

PACS numbers; 02.70.Ss, 02.70.Tt, 31.25.-v 



Quantum Monte Carlo (QMC) provides a means of in- 
tegrating over the full 3A^-dimensional coordinate space 
of a many-body quantum system in a computationally 
tractable manner while introducing a random error in 
the result of the integration^. The character of this 
random error is of primary importance to the applica- 
bility of QMC, and in what follows an understanding of 
the underlying statistics is sought for the special case of 
Variational Monte Carlo (VMC). 

Within QMC, estimated expectation values have a ran- 
dom distribution of possible values, hence it is necessary 
to know the properties of this distribution in order to 
be satisfied that the statistical error is sufficiently well 
controlled. Many strategies (notably those involved in 
wavefunction optimisation and total energy estimation) 
sample quantities that exhibit singularities, and sample 
the singularities rarely. This is characteristic of a Monte 
Carlo (MC) strategy that is unstable and prone to ab- 
normal statistical error due to outliers Q. 

In what follows the VMC method is analysed in order 
to obtain the statistical properties of the random error. 
Analytic results are obtained, and compared with the re- 
sults of numerical calculations for an isolated all-electron 
carbon atom. The analysis naturally divides into four 
sections. Section |T] provides a summary of the implemen- 
tation of MC used within VMC. The construction of es- 
timated expectation value of an operator/trial wavefunc- 
tion combination is described for the 'standard sampling' 
case (the most commonly used form[l[) as a special case 
of a more general formulation. This short section pro- 
vides no new results, but introduces the notation used 
throughout, and presents well established results from a 
perspective appropriate to the following sections. 
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Section [IT] provides a transformation of the 3A^- 
dimensional statistical problem to an equivalent 1- 
dimensional problem. The purpose of this section is to 
provide a simple mathematical picture of the statistical 
process that is entirely equivalent to the original 3A^- 
dimensional random sampling process. This is achieved 
by removing the statistical freedom in the system that is 
redundant for a given estimate. The principal result of 
this section is the derivation of a general statistical prop- 
erty that arises for almost all of the trial wavefunctions 
available for VMC calculations, and that may not eas- 
ily be prevented. This statistical property dominates the 
behaviour of errors in VMC estimates, and the demon- 
stration of its presence provides the starting point for the 
derivation of the statistics of estimators. 

In section IIIII the 'standard sampling' formulation of 
VMC is analysed. The goal is to find the distribution 
of the random error in statistical estimates of the total 
energy and the 'variance', for a finite but large number 
of samples. The principal conclusion of this section is 
that the Central Limit Theorem (CLT) is not necessarily 
valid and, when it is valid, finite sampling effects may be 
important even for a large sample size. This is demon- 
strated analytically, in the form of new expressions for the 
distribution of errors occurring for 'standard sampling' 
estimates of the total energy and variance. Numerical 
results for an isolated carbon atom provide an example 
of this effect for a calculation employing an accurate trial 
wavefunction. 

In section IIVI estimates of several other quantities rel- 
evant to QMC are considered, and the invalidity of the 
CLT for these estimates is described (when derived using 
the same method as section UlI)) . This section directly 
relates to the infinite variance estimators that have pre- 
viously been discussed in the literature @. 

Finally, we note that this is the first of two closely re- 
lated papers. It provides a general approach to rigorously 
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deriving the statistics of the random error that is an in- 
herent part of QMC methods, and uses this approach to 
obtain the statistical hmitations of the simplest available 
sampling strategy. The following paper [3| employs this 
new analysis of the statistics of QMC in order to design 
sampling strategies that are superior, in the sense that 
the Normal distribution of random errors can be rein- 
stated for a given QMC estimate. 



'STANDARD SAMPLING' VARIATIONAL 
MONTE CARLO 



The basic equation by which MC methods provide a 
statistical estimate for an integral may be written as 
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where P is the probability density function (PDF) 
of the independent identically distributed (IID) 3iV- 
dimensional random vector R„, and W,. is the random 
error in the estimate. 

Introducing some notation used throughout the paper, 
the statistical estimate of a quantity / constructed using 
r samples is denoted Ar[f], hence Eq. ([T]) can be written 
as 
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where the LHS is the statistical estimate of the integral 
(the sample mean in Eq. ([T])), and the RHS can be inter- 
preted as a sum of an expectation of a quantity x — f / P 
sampled over the distribution with PDF P, and a ran- 
dom error. Whether the estimate is useful depends on 
the PDF of Wr, specifically how this distribution evolves 
as r increases. 

An expectation value of the quantum mechanical op- 
erator g and (unnormalised) wavefunction, -tp, is defined 
by 



E[GL^jyp-p] 

E[^VP;P] 



(3) 



where Gl — '4^~^g4' is the 'local value' of the opera- 
tor/trial wavefunction combination. By definition, VMC 
provides a MC estimate for this quantity, and since it 
is a quotient of two expectations it is more complex to 
estimate than a single integral. 

'Standard sampling' is the most common and straight- 
forward choice, for which samples are distributed as 
P(R) — Xijj'^, resulting in the simple form 



Ar[G] = E[Gi;AV/] +Y, 

= i^GL(R„), P(R) 



(4) 



where A need not be known since it is not required to 
generate samples distributed as P(R)[1|. This simple 
form arises from choosing P such that the normalisation 
integral of Eq. ^ is sampled perfectly. 

Within 'standard sampling' it is usually assumed that 
the CLT is valid, and that r is large enough for the 
asymptotic limit to be reached to a required accuracy. 
If this is so, then is distributed normally with a mean 
of 0, a variance given in terms of the sample variance 
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Var [A^ [G]] = - [Var [Gl(R,: 



(5) 



and a confidence range for an estimated value can be 
obtained via the error function. 

Two issues concerning the nature of the random error 
naturally suggest themselves. The use of the CLT to 
provide a confidence interval for the estimate implicitly 
assumes that the large r limit has been reached. Whether 
this is the case for finite r is a non-trivial questionfs'l . The 
second issue is the validity of the CLT. Since this theorem 
is applicable to a limited class of distributions that may 
or may not include the distribution of samples within 
VMC (or other QMC methods) this is also a non-trivial 
question. 

It is useful at this point to introduce some further def- 
initions and notation. An estimate is a random variable, 
and random variables are denoted by a sans-serif font 
throughout. A particular sample value of an estimate 
is referred to as a sample estimate, and estimates are 
usually constructed from sums of random variables. The 
PDF of the estimate constructed from r random variables 
is denoted Pr{x), and defined by 



Prob [a < Ar [G] < b] = f Pr{x)dx, 

J a 



(6) 



and an estimate is unbiased if it has a mean for a given 
r that is equal to its true value. For the estimate to 
be useful the PDF of the error, Y^, must possess cer- 
tain properties. It would be desirable for this PDF to 
approach a Dirac delta function for increasing r, and for 
some information to be available on the form of the PDF 
for finite r. In addition an estimate-able confidence range 
for finite r is desirable, and zero mean value for Y^ for 
finite r. 



II. GENERAL ASYMPTOTIC FORM FOR THE 
DISTRIBUTION OF LOCAL ENERGIES 

For the standard implementation of VMC summarised 
in the previous section, the basic random variable is the 
3A^-dimensional position vector of all the particles within 
the system, R. This is a 'fundamental' random variable in 
the sense that QMC is normally implemented as a ran- 
dom walk in the multidimensional space, R. However, 
this random variable contains far more information than 
is required for many purposes. An analysis is given here 
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for the expectation value of quantities that may be ex- 
pressed in terms of the local energy, Ej^ = tp~^Hip. Note 
that this is a general procedure, and is applicable to es- 
timates of any operator by defining a local field variable 
(scaler, vector or higher order) to remove the redundant 
statistical freedom present in the full 3A^-dimensional 
space, providing a more concise representation. 

The expectation of a function of the local energy E^ 
is defined as 



E[/;AV'2] (7) 
P^2(R)/(£;i)dR, (8) 



and the 'standard sampling' MC estimate of this is con- 
structed by sampling the 3Af-dimensional coordinate vec- 
tor over the 'seed' PDF P^2(R) = A-^^ 

Integrating over a hyper-surface of constant local en- 
ergy removes redundant statistical degrees of freedom 
leaving the field variable, El(TI), as the random vari- 
able. The expectation is then given by 



P^2{E)f{E)dE, 



with the 'seed' PDF of the local energy given by 
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where 9 is a surface of constant El, and VrE'l is the gra- 
dient of the local energy in 3A^-dimensional space. The 
interpretation of this surface integral is straightforward, 
provided that disconnected surfaces and non-smoothness 
in the hyper-surface are dealt with as a sum of sepa- 
rate (and sometimes connected) surface integrals. Equa- 
tion (jlOp simplifies the interpretation of general statis- 
tical properties considerably. Analytic properties of the 
seed distribution may be derived that are general to the 
(iJ, ip) combinations used for VMC. 

In what follows we limit ourselves to the case of elec- 
trons in the potential of fixed atomic nuclei and Coulomb 
interactions, giving a local energy in 3iV-dimensional 
space of the form 
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where V"ee(R) is the sum of two-body potentials (the 
electron-electron Coulomb interaction), and Ve2;t(R) is 
the sum of one body potentials (the electron-nucleus 
Coulomb interaction) . is the local kinetic energy, and 
all other terms are contained in Vl, the local potential 
energy. Singularities will occur for a general "0, and the 
expression above naturally suggests classifying these into 
4 different types. Each has a characteristic influence on 
the asymptotic behaviour of P^2 (E) , and an analysis of 
this relationship is given below. 



A. Type 1: electron- nucleus coalescence 

Type 1 singularities are those resulting from any elec- 
tron coordinate approaching a singularity in the one 
body external potential Vext-, such as the —Z/r behaviour 
of an atomic nucleus. This occurs on a 3iV — 3 dimen- 
sional hyper-surface. 

For a particular electron of coordinate ri approach- 
ing a nucleus, the trial wavefunction can be expanded in 
spherical coordinates to give 



ai(ri, R3Ar_3)ri -I- 



(12) 



where ri = (ri, fl) and R3Ar_3 is the 3A^ — 3 dimensional 
vector of the rest of the coordinate space. If ijj does 
not possess singularities, it must be possible to expand 
a„(r2) as a closed sum of spherical harmonics Yi„i{Q) with 
I < n. Similarly, for ip to be continuous up to order n, the 
coefficient a„(0) must contain only odd/even I spherical 
harmonics in its expansion for odd/even n. 

For a trial wavefunction that is smooth at ri = this 
results in a local energy of the form 
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The absence of a r j~ term is a direct consequence of ip be- 
ing continuous at ri = 0, and the rj~^ term is entirely due 
to the presence of the nucleus potential and the deriva- 
tive of ip being continuous at ri = 0. Figure [T](a) shows 
a 2D cut through the 3D space of ri, with R3JV-3 held 
constant and the singularity due to the nucleus at the 
centre of the (asymptotically) spherical constant energy 
surface. 

Rearranging and repeated re-substitution provides the 
integrand in Eq. (jlOp . and integrating over the constant 
energy surface defined by the limit E^ — > ±00 (a 'hyper- 
tube' which is spherical in the space of ri, but has no 
simple form in the 3A^ — 3 dimensions of R37v_3) gives 
the general form for the tail [31J 



P^2{E) = 



E, 



'0) " (eo 



ei 

{E-Eo) 



E:»Eo 
E<^Eo ' 
(14) 

where E Eq {E Eq) denotes an asymptotic expan- 
sion that converges for E greater (less) than some finite 
value. The asymptotic behaviour is one sided since the 
singularity is negative, and the nodal surface does not 
need to be considered. 

If the usual electron-nucleus Kato cusp conditionj^, 0] 
is forced on ^ it introduces a discontinuity in the gradi- 
ent at ri = that exactly cancels the singular nucleus 
potential in the local energy via the local kinetic energy, 
hence this type of singularity can generally be removed. 
The cusp condition also introduces an dependence in 
the bo term of the expansion, and hence a discontinuity in 
the local energy at the nucleus (although it is of zero size 
for some wavef unctions) [s^ . For N electrons approach- 
ing the nucleus concurrently the same cusp conditions 
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FIG. 1: Constant energy surfaces in the large E limit. Figure (a) shows the surface in terms of the electron-nucleus vector as 
coalescence is approached. The same geometry arises for electron-electron coalescence where the electrons possess different spin. 
Figure (b) shows the surface in terms of the electron-electron vector for electrons of like spin for the case where no singularity 
is present in the local kinetic energy. Figure (c) shows the constant energy surface for singularities at the nodal surface due to 
the local kinetic energy, Tl- 



are sufficient to prevent a singularity, as discussed in the 
next section. 



B. Type 2: electron-electron coalescence 

Type 2 singularities may occur for approacliing 
(* j)i ^'^d result from a singularity in the two- 
body electron-electron interaction, Vee- The coalescence 
of electrons of like spin (indistinguishable) and unlike 
spin (distinguishable) must be considered separately. By 
transforming to centre of mass coordinates for the two 
electrons with positions vectors ri,r2 defined as ri2 = 
ri — r2 and Si2 = (ri -|- r2)/2 the same approach can 
be taken as for the electron-nucleus coalescence surfaces. 
To simplify the notation the vector Si2 is included with 
the coordinates of the rest of the electrons in the vector 

R-3Af-3- 

For distinguishable (unlike spin) electrons the situation 
in entirely analogous to the electron-nucleus case. The 
electron-nucleus vector and interaction is replaced by the 
electron-electron vector and interaction to give 

P,.(i?) = |(^-^°)"'(^°+(^ + ---) , 
[ E<^Eo 

(15) 

where Eq and the coefficients, e„, are distinct from those 
in Eq. (In order to keep the notation simple the 

same symbols are used for distinct coefficients in all of 
the series expansions contained within this section.) The 
asymptote is one sided due to the repulsive electron- 
electron interaction, and the nodal surface does not in- 
fluence the result. Enforcing the Kato cusp condition for 
unlike spins removes these tails and introduces a discon- 
tinuity in the local energy in precisely the same manner 
as for the electron-nucleus coalescence. 

For indistinguishable (like spin) electrons the situation 
is more complex. Figure [IJb) shows a 2D cut through 
the 3D space of ri2, with R3JV-3 held constant and a 
constant energy surface that is (asymptotically) spheri- 



cal in the electron-electron coordinate. The singularity 
due to electron-electron coalescence is at the centre of 
the sphere. Unlike the distinguishable electron case the 
coalescence point must fall on the nodal surface, and the 
influence this has on ip must be taken into account. 

Expanding a smooth antisymmetric trial wavefunction 
about the coalescence point (on the nodal surface) gives 

ipiru) = ai{n, R3N^3)ri2+a3{n, R.3N^3)rl^+. . . , (16) 

where interchange of electrons corresponds to inversion 
about ri2 = so the coefficient a„ contains only odd I 
spherical harmonics and I < n. This provides a quadratic 
lowest order variation in the probability density perpen- 
dicular to the nodal surface, which results in a local en- 
ergy of the form 

El{R)-Eo = — +6i(17,R3jv~3)ri2 + ..., (17) 

and an (asymptotically) spherical constant local energy 
surface centred at the coalescence point. Note that the 
absence of a term is a direct consequence of the gra- 
dient of tp being continuous at ri2 = 0. The term 
is entirely due to the Coulomb potential, together with 
■0 being odd on interchange of electrons and possessing a 
continuous second derivative at ri2 — 0. Performing the 
'hyper-tube' integration then gives 

p I (i^-i^o)-^(eo+(^ + ...) E»Eo 

[0 E<t:Eo 

(18) 

where, since the singularity is positive, the asymptotic 
behaviour is one sided. 

Enforcing the Kato cusp conditionfil, [tI] for like spins 
introduces a second order radial term, with coefficient 
02 = ai/4. This provides a discontinuity in the 2"'^ 
order derivatives of ip at ri2 = that cancels the sin- 
gular electron-electron interaction, and so removes the 
tails due to the like spin electron-electron coalescence. A 
further consequence is a continuous local energy as the 
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coalescence plane is crossed, with a discontinuity in the 
gradient of the local energy. 

So far only electron-nucleus and electron-electron co- 
alescence has been considered. For the general case of 
many electron coalescence (some distinguishable, some 
not) at a nucleus site, or at any point in space, and a 
smooth trial function ?/;, the local energy may be written 
in the form 

ELm-E. = Y.v.+Y.h + --- (19) 

provided that the local kinetic energy is smooth. As dis- 
cussed by Pack^], provided the trial wavefunction satis- 
fies the cusp conditions for each electron-electron and 
electron-nucleus coalescence, then the Coulomb singu- 
larities will exactly cancel with singularities in the lo- 
cal kinetic energy. These conditions are easily satisfied 
for trial wavefunctions that are a function of electron- 
nucleus, electron-electron and electron-electron- nucleus 
coordinates, but for higher order correlations internal co- 
ordinates must be considered explicitly. 

Although the Kato cusp conditions remove the 
Coulomb singularities from the local energy, they do not 
prevent the occurrence of discontinuities on the same 
hyper-surface of electron-nucleus and electron-electron 
coalescence. Further cusp conditions that remove these 
discontinuities may be obtained directly from the local 
energy expansions given above. 



C. Type 3: nodal surface 

The third type of singularity (and associated tails in 
the seed distribution) occurs for almost all of the trial 
wavefunctions used in QMC calculations, with the ex- 
ception of some few electron systems. 

Type 3 singularities are due to the kinetic energy only, 
and occur at the nodal surface due to the presence of 
tjj in the denominator of the expression for the local ki- 
netic energy. There is no equivalent to the previous cusp 
conditions that can easily be enforced on ^ to prevent 
these type 3 singularities occurring, and they are of a 
fundamentally different nature. 

Proceeding in a similar manner to the previous two 
cases, the trial wavefunction is expanded about the sin- 
gular surface, in this case the 3A^ — 1 dimensional nodal 
surface. This expansion is then used to provide a con- 
stant local energy hyper-surface, over which an integral 
is performed to obtain the PDF in energy space. 

Figure [ijc) shows a 2D cut through the 3N dimen- 
sional space that includes the nodal surface, and a con- 
stant local energy surface at a perpendicular distance S± 
from the nodal surface. Expressing the vector of a point 
on the constant energy surface as 

R = X + 5_Ln, (20) 



where X is a point on the nodal surface, and n(X) = 
VrE'l |x is the normalised gradient at X, gives 

V^(R) =ai(X)5i+a2(X)5i + ... (21) 

and 

El{-R)-Eo = &_i(X)5X^+6o(X) + &i(X)5i + . . . . (22) 

Employing these in Eq. (jlO|) and integrating over the con- 
stant energy surface defined by the limit El — > ±oo (the 
nodal surface) gives the general form 

P^2{E)^ (E^Eo)-^(eo + jE^ + ..) \E\:»Eo . 

(23) 

Equation (I23|) tells us that for a general trial wavefunc- 
tion and Hamiltonian the resulting 'seed' probability dis- 
tribution in energy space has this asymptotic form for 
type 3 singularities. This result is central to the rest of 
this paper. 

A special case of this type of singularity arises for a 
trial wavefunction where a nodal pocket is at the criti- 
cal point of appearing/disappearing, which may occur in 
the process of varying a parameterised trial wavefunction 
in the search for an optimum form. This occurs where 
a solution of the equation 'tjj = disappears, or for a 
local maximum/minimum of ip crossing the nodal sur- 
face. At this critical point ip — defines a single point 
in 3A^ dimensional space, and the wavefunction may be 
expanded about this point using hyper-spherical coordi- 
nates R = {R, ft) (with R the hyper-radius and ft the 
3N — 1 hyper-angles) as 

ij{R) = a2{n)R^ + a3{n)R^ + .... (24) 

The associated local energy then takes the form 

El{R, n)-Eo^ h-2{ft)R-'^ + h-i{rt)R-^ + 5o(f^) + . . . 

with the singular behaviour arising via the local kinetic 
energy. Following the same approach as for type 1 and 
type 2 singularities, but integrating over the surface of 
the hyper-sphere gives 

''^^ ^""^ ^ ^T^^P^ W^) +■■■)' 

(26) 

an asymptotic tail in the PDF that is one sided since the 
constant energy surface exists only in the nodal pocket 
that is not being created/annihilated. This gives a faster 
decay than E~^ ior N > 1, and nodal pockets can only 
occur in the ground state for N > 2. Consequently, this 
effect is secondary to the E^^ behaviour arising from 
nodal surfaces that are not being created/annihilated, 
and will only dominate if annihilation of the nodal pocket 
results in no nodal surfaces anywhere in space. This can 
only occur if all fermions in the system are distinguish- 
able. 
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D. Type 4: arbitrary bound trial wavefunctions 

Singularities in the local energy may also occur if the 
local energy approaches infinity as any or all electrons ap- 
proach an infinite distance from the nuclei or each other. 
This type of singularity is referred to as type 4, and its 
source may be the local kinetic energy, the local potential 
energy, or both, and can only occur for systems that do 
not extend over all space. 

For these finite systems a reasonable assumption about 
the general form of a trial wavefunction used in QMC is 
that it is a bound state of some 'model' Hamiltonian (this 
encompasses the exact, HF, MCSCF, Kohn-Sham, Gaus- 
sian basis wavefunctions, and many others, with or with- 
out a Jastrow factor or backfiow transformation). Hence, 
for the types of wavefunction that are used in QMC cal- 
culations, the asymptotic behaviour can be written as 

V'(R) cx |Rr e-'^l^l", (27) 

where the parameters a, (3, and 7 depend on the type of 
trial wavefunction. 

Following the same approach as for type 1 and 2 singu- 
larities, the influence on the asymptotic tails of the seed 
distribution can be determined by integrating over the 
constant local energy surface. This tells us that for 7 > 1 
(e.g., a Gaussian basis set) P^2 decays as an exponential 
function of a power of E, whereas for < 7 < 1 (7 = 1 
is the correct asymptotic form) P^2 is zero outside of 
an energy interval (assuming that none of the other 3 
types of singularity are present). The second case is 
preferable, but the former is not significant as it can only 
result in the presence of exponentially decaying tails 
in . In what follows type 4 singularities are irrelevant. 

Type 3 tails occur for almost all many body trial wave- 
functions, with some exceptions. First, it is possible for 
there to be no nodal surface. This does not occur for sys- 
tems containing two or more indistinguishable fcrmions, 
and does occur if the trial wavefunction is a bosonic 
ground state. Second, the nodal surfaces may be ex- 
actly known from symmetry considerations, as discussed 
by Bajdich et aLQ. A third exception arises from consid- 
ering an effective Hamiltonian for which the trial wave- 
function is an exact solution. This has a potential defined 
by 

K//=i?e// + ^^, (28) 

where E^f / is arbitrary, but is usually chosen to be zero 
for a completely ionized system. If can be shown 
to possess no singularities at the nodal surface, then 
V^V' = at the nodal surface and type 3 tails do not 
occur. An example is the Slater determinant, as this 
is the exact solution for fermions in a one-body poten- 
tial (with no two-body or higher interactions present in 
Veff). (Note that the available modifications of such 'ex- 
act model' solutions, such as Jastrow factors, result in a 
many-body Veff that is singular at the nodal surface.) 
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FIG. 2: Variation of the local energy in the presence of sin- 
gularities of all four types, with an electronic coordinate, ri, 
passing through singular hyper-surfaces. /, II, III, and IV 
denote singularities due to e-n interaction, e-e interaction, 
the nodal surface, and incorrect asymptotic behaviour (shown 
here for the Gaussian case), respectively. Units are arbitrary. 



Removing type 3 singularities is a non-trivial problem 
since it is necessary to ensure that remains finite over 
the nodal surface apart from on the coalescence planes, 
where it must possess a singularity that exactly cancels 
the electron-electron Coulomb interaction. Type 3 tails 
are taken to be unavoidable in practice. 

In order to clarify when these singularities/tails oc- 
cur it is worth considering some examples. For an exact 
wavefunction none of the singularity types occur. For a 
Hartree-Fock or Kohn-Sham Slater determinant with no 
basis set error only type 2 singularities occur, since the 
electron-nucleus cusp conditions are satisfied, the asymp- 
totic wavefunction behaviour has the correct exponential 
form, and the local kinetic energy is finite at the nodal 
surface. For a Hartree-Fock or Kohn-Sham Slater de- 
terminant with a Gaussian basis set, singularities of all 
four types occur, but type 1 and 2 singularities can be 
expected to dominate. 

Figure [5] shows a schematic of the form taken by 
the singularities in the local energy as an electron 
passes through the nucleus, through a coalescence plane, 
through a nodal surface, and continues away from the 
nucleus, for the case where all types of singularity are 
present. From this point on, only the infiuence of type 
3 singularities and the associated symmetric tails in the 
seed distribution are considered, since type 1 and type 
2 behaviour is easily and routinely removed, and type 4 
behaviour does not affect the analysis that follows. It is 
the presence of these 'leptokurtotic' power law tails (also 
known as 'heavy tails', or 'fat tails') in the PDF of the 
sampled energies that provides the starting point for an 
analysis of random errors in the estimates of expectation 
values within VMC. 



7 



IN 

b 




-50 50 



FIG. 3: The seed probability density function estimated by a 
histogram of r = 10^ sampled local energies (black). These 
are results for an accurate all-electron carbon trial wavefunc- 
tion, as described in the text. Shown in grey is the model dis- 
tribution of Eq. (|23|l that reproduces the mean and variance 
of the samples, and the dotted line is the Normal distribution 
that reproduces the same mean and variance. 



Before commencing, it is useful to explicitly show the 
presence and magnitude of the type 3 singularities for 
a real system, the isolated all-electron carbon atom. A 
numerical Multi-Configuration-Hartree-Fock calculation 
was performed to generate a multideterminant wavcfunc- 
tion consisting of 48 Slater determinants (correspond- 
ing to 7 configuration state functions (CSF)) using the 
ATSP2K code of Fischer et al.% Further correlation 
was introduced via a 83 parameter Jastrow factor [To|. 
and a 130 parameter backflow transformation llj. This 
219 parameter trial wavefunction was optimised using 
a standard variance minimisation method[l^, result- 
ing in EvMc — —37.83449(7) a.u., compared with the 
'exact'llJl result of —37.8450 a.u. Of those trial wave- 
functions that can practically be constructed and used 
in QMC this may be considered to be accurate, and re- 
produces 93.3% of the correlation energy at the VMC 
level. 

As discussed above, only type 3 singularities contribute 
to the asymptotic behaviour of the seed distribution. Fig- 
ure [3] shows an estimate of the seed PDF, P^2{E), con- 
structed by taking 10^ standard samples of the local en- 
ergy, binning these into intervals, and normalising 14^. 
Also shown is a simple analytic form 



V2 



E-Et. 



(29) 



and a Normal distribution, both with a mean and vari- 
ance of Etot and tr^ whose values are obtained from the 
data using the usual unbiased sample estimates. 

It is apparent that the seed distribution, P^2 [E) , is 



not well described by a Normal distribution. Considering 
that no fitting procedure is employed (beyond matching 
the first two moments of the model and sample distribu- 
tions) it is somewhat surprising that the simple model 
distribution is so close to the actual distribution. This 
is most clearly demonstrated by comparing the number 
of sample points predicted in a 'tail region' defined by 
[E — Etot) > 10(T — 2.25 a.u. The numerical data has 
2990 sample points in this region, p{E) predicts 3481 
points, and the Normal distribution predicts 1.7 x 10~^* 
points. 

An alternative measure is to assume the asymptote 



Pasym\^ ) — ^ 



E — Etot 



(30) 



to be dominant in the 'tail region', and to equate the sam- 
pled and predicted number of outliers. This estimates the 
magnitude of the leptokurtotic tails to be A3 = 0.86 (in 
comparison with A3 = 1 for the model distribution of 
Eq.ESl). 

Figure [3] suggests that the local energy is not well sam- 
pled close to the nodal surface, where the deviation from 
the mean is greatest. Further suspicion that a more de- 
tailed analysis is required arises when it is noted that 
third or higher moments do not exist for this seed dis- 
tribution, even though a finite number of samples will 
provide an estimate of these higher moments that con- 
verges to infinity as the sample size is increased. 



III. RANDOM ERROR IN VMC ESTIMATES 

In the previous section no mention of MC methods has 
been made. In this section the consequence of choosing 
the 'standard sampling' strategy in QMC is investigated. 

It has been noted by previous authors that for many 
calculations the distribution of the local energy is clearly 
not Gaussian, for both VMC and DMC calculations (15l 
[TgI . [TtI [TH . Section |TT] shows that this is generally the 
case. In previous work it also appears to be implicitly as- 
sumed that the form of the seed distribution is irrelevant 
to the application of the CLT to infer information on the 
random error of estimated quantities [l|. In what follows, 
the influence of the leptokurtotic tails on the validity of 
the CLT is examined in detail, and the distribution of 
random error in VMC estimates is derived. 

Numerical evidence for a valid CLT is at best limited, 
and only weakly suggestive. For most applications of 
QMC only single estimates are constructed, with an esti- 
mated random error calculated using the CLT. Generally 
no ensemble of estimates is calculated to justify that this 
error is Normal. The best we can do is observe that for 
many published results the estimated total energies and 
errors are consistent with exact energies where these are 
known in that they are higher (to within the statistical 
accuracy suggested by the CLT). This still leaves signif- 
icant room for non-Gaussian error, especially for larger 
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systems and estimates of quantities other than the total 
energy. 

Results for wavefunction optimisation within VMC are 
more strongly suggestive. The most stable implementa- 
tion possible for a stochastic minimisation method would 
provide a Normal random error in the optimised func- 
tional. Instability is commonly observed for many of the 
available implementations, particularly for a large num- 
ber of particles or where the nodal surface of the trial 
wavefunction is varied [isl. [l6j . This is consistent with 
the notion that the CLT may not be valid for these im- 
plementations. 

Possible distributions of error in estimates can be sum- 
marised as follows. The catastrophic case would be for 
the Law of Large numbers to be invalid, providing esti- 
mates that do not statistically converge to an expecta- 
tion as r approaches infinity. Another possibility is that 
the Central Limit Theorem may not be valid, providing 
estimates that statistically converge, but with a random 
error that is not Normally distributed. A further possibil- 
ity is that the CLT may be valid, but that the deviation 
from the Normal distribution for finite r is unknown, so 
may be significant for accessible sample sizes. A final, 
ideal case would be for the CLT to be valid, and for the 
deviation from the Normal distribution for finite r to be 
known, and to be unimportant for accessible sample sizes. 

The first and last of these are found not to occur, 
while the other cases do (depending on what is being 
estimated), as a direct consequence of the presence of 
the leptokurtotic tails. 



A. Total energy 

As discussed in section HI the unbiased estimate of the 
total energy constructed from local energy values at r 
points sampled from the P^2 distribution is given by 



this sum, 



1 

Ar [Etot] = - ^ E„, 



(31) 



with {E„} the IID random variables El{R). This 
(rescaled) sum of IID random variables can be analysed 
using the known properties of the PDFs of each E„ to 
obtain the PDF of the estimate itself. 

It is useful to introduce some supplementary random 
variables in order to keep the notation simple. Defining 
the mean and variance of P^2 as E [Ei] and cr^ provides 
the transformation 



1 



{£.„-K[El\), 



(32) 



as long as the first two moments exist. This X„ has a 
PDF, p{x), of mean and variance of and 1, and a sym- 
metric asymptotic behaviour cx Ijx^. Two further ran- 
dom variables are S^, defined as the sum of r independent 
samples taken from p(x), and the normalised version of 



(Xi 



The transformation from to [Efot] is 
A, [Etot] = ^Y, + E [El] , 



(33) 



(34) 



so that Yr is the random error in the estimate of the total 
energy in units of cr/y^. 

The validity of the CLT for these sums of random vari- 
ables is tested below, for the three most common forms 
of the CLT available. These are considered in order of 
increasing generality (in that they are valid for progres- 
sively larger classes of PDFs) and decreasing knowledge 
of finite sampling effects (in that limits on the deviation 
from normality for finite r are progressively less well de- 
fined). 

The least general CLT is provided by the existence 
or not of an Edgeworth series expansion Q. Provided 
that all the moments of p{x) exist, and that they satisfy 
Carleman's condition 5], then the distribution of Y^ for 
r samples, Pr (y), can be uniquely defined by the infinite 
series 



Priy) 



1 



1 



faiy) , feiy) 



(35) 



where each fm{y) is a finite polynomial in y of order 
TO, and with coefficients that may be expressed in terms 
of the first to moments of the seed distribution. If this 
expansion is valid, Pr (y) converges to the Normal distri- 
bution for increasing r, and the expansion also provides a 
definite bound on the deviation of the distribution from 
Normal for finite r - the deviation can be estimated if 
necessary, and scales as the Gaussian function. For the 
seed distribution of local energies, P^2 , the asymptotic 
behaviour ensures that all moments higher than 2"*^ do 
not exist, hence this form of the CLT is invalid. 

A more general result is the Berry-Esseen theoremQ, 
which states that the inequality 



Pr (y) - 



1 



C 



dy < -^T-^ / [y[^p{y)dy, 



(36) 

is valid provided the 3'"'' absolute moment on the RHS is 
finite (C = 0.7655 is the best value of C available [19]). 
This proves that Pr (y) converges to the Normal distri- 
bution for increasing r, and also provides a bound on 
the deviation of the distribution from Normal for finite 
r. The asymptotic behaviour of the seed distribution en- 
sures the nonexistence of the 3'"'^ absolute moment, hence 
this form of the CLT is invalid for P^2 . 

The final candidate is Lindeberg's theoremlSj. This 
is the most general form of the CLT, and provides the 
weakest bound on the deviation from Normality for finite 
r. Provided that 



Max 



\Hy)\ 



i + r 



< oo, 



(37) 
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it follows that 



lim 



Pr {y) <P{y)dy = 



/2ti 



(t>{yy 



dy, 



(38) 

or that in the limit of r approaching infinity the proba- 
bility of the sum of random variables falling in a given 
interval (given by (f>{y) = 1) is equal to that of the Nor- 
mal distribution provided by the CLT, provided that the 
2nd jjioment of p{x) exists. This provides confidence lim- 
its from the sample mean and variance via the CLT for 
large r, but two points must be borne in mind. First, 
for (f>{y) increasing faster than second order (such as the 
definition of moments higher than 2"'^ order) the expec- 
tation is not defined, even in the limit of r approaching 
infinity. Second, for finite r there is no limit to the mag- 
nitude of any deviation from Normal, or to how fast these 
deviations decay with increasing r. 

These theorems inform us that the random error in 
the unbiased estimate of the total energy obeys the CLT, 
but no information is available about the deviation of 
the distribution of errors from Normal for finite r. This 
is unsatisfactory, since only a finite number of samples 
will ever be available. 

Using the asymptotic behaviour derived in section |TT] 
does allow us to extract information about the devia- 
tion from Normal that appears in (jj). In what follows 
this is achieved by using the same strategy as the most 
frequently presented derivation of the CLT[23], but ex- 
plicitly taking into account the leptokurtotic tails. 

Denoting the PDF of the sum as Pr{sr) (distinct 
from Pr (y) but related via a change of variables) and 
viewing this sum as a random walk in one dimension 
leads immediately to the iterative convolutions 



Pri-Sr) ^ p{Xr)^ Pr-l{Sr-l), 



(39) 



starting from Pi(si) — pixi). In Fourier space this is 
simply a product, and defining the Fourier transform as 



p{k) = / p{x)e 



— ikj! 



dx 



immediately gives 



Pr{k) 



_ grlnp(fe) 



(40) 



(41) 



with Pr{k) andp(fc) the characteristic functions of Pr{sr) 
and p{x) respectively. Equation (|4T|) reduces the prob- 
lem to that of finding the inverse Fourier transform of the 
r*'' power of the Fourier transform of the seed distribu- 
tion (with an appropriate transformation of the random 
variables) . 

For a PDF to possess a smooth characteristic func- 
tion (in the sense that all derivatives exist at all points), 
the PDF must decay at least exponentially fast as \x\ 
oo[2l|. If this were the case, then a Taylor expansion 
would exist for \np{k) that is valid for all real k. For the 
distribution of local energies, the PDF falls to zero alge- 
braically slowly which implies the presence of poles in the 



complex plane for finite \x\, discontinuities in the Fourier 
transform at the origin, and no Taylor series expansion 
about fc = for Inp(fc). 

The Fourier transform may be performed by contour 
integration in the complex plane, closing the contour in 
the upper half plane for 5R[fc] < 0, and the lower half plane 
for di[k] > 0. This, in addition to the constraints on the 
residues and the position of the poles that prevent any 
slower asymptotic behaviour, provides a general series 
expansion 



1. 



A, 



Inpik) = - + ^ Ifcl + ^3 + O (fc4) . (42) 

All of the coefficients in this expansion are completely 
unrelated to moments of the seed distribution, and for the 
model distribution shown in Fig. ([3]), A3 = 1 and 773 = 0. 
Higher order discontinuities may also be present in this 



expansion, as generally a 



term in the asymptotic 



behaviour of a function is accompanied by a term 
in its Fourier transform due to the properties of bilateral 
Laplace transforms |21i]. 

This series expansion provides the required expression 
for Pr{k), 

A3 



Pr{k) = exp 



-r-k^ 
2 



+ r-^\k\'+rrj^ {ikf + O {k^) 



3%/2 



Changing variables to w = ^/rk and y = Sj 
performing the inverse Fourier transform gives 



(43) 
and 



Priy) 



1 

2^ 



exp 



A3 1 



w 



+r]3^ iiwf + O 



dw. 



(44) 



where the lowest order terms that are independent of 
r have been factored out. Expanding the exponential 
whose argument is a function of as an asymptotic 
series in r gives 



Pr{y) = My) + ^^"^^3*^2/) + m^My) + ■ 

where 4>o{y) is the standard Normal distribution, 

J -00 



and 



<t>q{y) = 



2ti 



{iwYe''"y-'"^''^dw. 



(45) 



(46) 



(47) 



Higher order terms can be written in the same form, and 
will have a prefactor proportional to r^"'^'/^. Note that 
Xq and (pq are distinct only for odd q. 

Since 0o(j/) is a Gaussian function, the CLT is valid, 
and the PDF may be expressed as 



Pr{y) 



1 
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where D{x) is the Dawson integral 2l| defined by 



D{x)=e-'=' / e*'dx, 
Jo 



(49) 



and possessing finite derivatives of aU orders, and a 
known asymptotic expansion. Further terms can be in- 
cluded explicitly if required, as higher order derivatives 
of the Gaussian function and Dawson integral. 

In a region close to the mean, Eq. (|48p may be ex- 
panded in the form 



lim Pr{y) = 
\y\-*o 



1 



2tt 



-^hi (y) 



r 



(50) 

where hi is an infinite series that converges over a finite 
region surrounding the mean. This expansion differs from 
the Edgeworth series in that it does not converge for all 

y- 

Far from the mean, where the previous series expansion 
does not converge, the asymptotic behaviour takes the 
form 



\y\ 



lim Pr{y) 



yl^l + J_lh2 (- 



TT yTy"^ \fTy^ \y^ 



Of' I 



(51) 

with h2{x) an infinite scries that converges over a finite 
region surrounding a; = 0. This form arises because the 
second sum in Eq. (|48|) dominates for large y (it is ob- 
tained from the asymptotic form of the derivative of the 
Dawson integral), and is fundamentally different in char- 
acter to the Gaussian decay that would occur were an 
Edgeworth series to exists. 

The model seed distribution introduced in the discus- 
sion of the all-electron carbon VMC results of the previ- 
ous section corresponds to the special case A3 = 1 and 
hi = h2 = 0, and is the simplest form that results in this 
'persistent leptokurtotic' behaviour for the distribution 
of total energy estimates. 

These results allow some general observations about 
the distribution of errors in total energy estimates. As 
expected, the Normal distribution emerges in the large 
r limit. However, for finite r the character of the devia- 
tion far from the mean is dominated by E~'^ tails. The 
magnitude of these tails, A3 is not expressible in terms 
of moments of the samples, but is required in order to 
decide whether these leptokurtotic tails are statistically 
significant. 

Figure S] shows the distribution of errors, {Pr of 
Eq. truncated to order 1/r^/^), for a range of A3 val- 
ues and 7^3 = (a non-zero value would introduce some 
asymmetry close to the mean). A non-zero A3 causes a 
redistribution of probability in an inner region where the 
Gaussian contribution to the density is dominant, with 
a net shift of probability to an outer region where the 
Gaussian contribution is vanishingly small and the lep- 
tokurtotic tails dominate. 

A useful indicator of the impact of the leptokurtotic 
tails on confidence limits can be extracted as follows. The 
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FIG. 4: Probability density function for the random error in 
the estimated total energy. Results shown are for a kernel es- 
timate of the PDF resulting from 10'' estimates with r = 10^ 
for each estimate (black). Grey lines show the predicted dis- 
tribution, including leptokurtotic tails, for different A3 values. 
For comparison, the Normal distribution that emerges in the 
large r limit is also shown (dotted line). 



deviation from the mean (in units of standard error) at 
which the leptokurtotic tail starts to dominate can be 
defined as the intersection of the dominant parts of the 
asymptotic and small y expansion of the distribution. 
This provides the equation 



■Z 1 I 



41ny2 



(52) 



which may be solved numerically, and whose solution de- 
pends weakly on r/A| due to the logarithmic term. Spec- 
ifying extreme values of r < 10^ and A3 > 1 results in 
yc < 5.2. The value = 4 is chosen to be representative 
as it defines the 99.994% confidence interval for a Gaus- 
sian distribution. Using this crossover point naturally de- 
fines a 'Gaussian interval' by \y\ < 4, and a 'leptokurtotic 
interval' by |y| > 4. Table U shows the probabilities re- 
sulting from a seed distribution with varying r/A| values, 
where a typical value for the carbon atom calculations of 
sectionlHis (r, A3) = (lO'*, 1.0) or r/Ai = 10"*. 

It is apparent that the presence of the leptokurtotic 
tails could introduce significant errors, since the confi- 
dence intervals obtained by assuming that the error is 
Normal are not accurate if r/X^ is small enough. For the 
all-electron carbon atom considered earlier, the Normal 
interpretation appears to be valid provided a confidence 
of less than 99.98% is required. For larger A3 the tails 
become more significant, with outliers rapidly becoming 
more common - the probability of an estimated total en- 
ergy falling in the outlier region increases by two orders 
of magnitude over the range of values shown in the table. 

A more direct interpretation of the random error in 
the total energy can be obtained by constructing an esti- 
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r/Xl 



Prob (%) 

Aa" |y| <4'' |y| > 4^ 



oo 
10° 



0.0 
0.1 
1.0 
10.0 
33.3 
100.0 



99.994 
99.993 
99.985 
99.910 
99.728 
99.154 



0.006 
0.007 
0.015 
0.091 
0.272 
0.846 



"corresponding to r = 
''Gaussian region 
'^leptokurtotic region 



10^ 



TABLE I: Probabilities of sample total energies in interior 
and exterior regions for a range of values of r/As. A3 values 
in the second column are those corresponding to r = 10'* . The 
range considered is arbitrary, and values that typically arise 
for different systems are unknown. 



mate of the associated PDF from the numerical samples. 



A kernel estimatejl'^ 
biased total energy estimates, each from r 
energy samples using 



E — Ar [Etot] 



10^ local 



(53) 



where m is the number of estimates, h is the width pa- 
rameter chosen heuristically to provide the clearest plot, 
and the Kernel function, Q, is chosen to be a centred 
top-hat function of width 1. 

This (biased) estimate of the PDF is also shown in 
Figure m The numerical data provides 1 sample estimate 
in the |?/| > 4 region, compared with a prediction of ~ 3 
estimates resulting from the value of A3 = 1 estimated in 
section ini A Normal distribution (obtained from sample 
mean and variance and the CLT) predicts 0.6 estimates. 
This supports the validity of the CLT confidence limits 
for these results. 

To conclude, estimates of A3 and of the total energy 
PDF both suggest that the leptokurtotic tails are present, 
but are not statistically significant for total energy esti- 
mates and the all-electron carbon atom considered. How- 
ever, it must be borne in mind that the estimated tail 
magnitude (A3) has unknown bias, and the range of tail 
magnitudes for other systems is completely unknown, ft 
seems reasonable to expect a larger, less symmetric sys- 
tem, or a trial wavefunction constructed from a finite 
basis, to provide stronger singularities and leptokurtotic 
tails than the accurate wavefunction considered here. 
This implies that the degree of validity of a CLT interpre- 
tation of confidence intervals must be justified for each 
individual difficult task given that no unbiased 

estimate of A3 is available. 

Were leptokurtotic tails to be absent, the evaluation of 
sample moments would be enough to demonstrate that 
the CLT interpretation was valid, and sample moments 
would provide finite r corrections to the confidence inter- 
val. This is not the case for finite A3 and some (necessar- 



ily biased) estimate of its value must be obtained from 
the data. 



B. Residual variance 

Following the same approach as for the total energy, 
the estimate of the 'variance' of the local energy is con- 
sidered. Before analysing the statistics of the standard 
unbiased estimate for finite sample size it is useful to 
define this quantity in terms of the underlying physics 
of the system, as opposed to the distribution of random 
samples. Previous publications [isl [l^ [2^ have used dis- 
tinct definitions of the 'variance' interchangeably, and 
inconsistently, especially when considering different opti- 
misation and/or sampling strategies. 

The residual associated with the Schrodinger equation 

for the system of interest and a normalised trial wave- 

^ 1/2 
function, ^jj = ^jj/ [J ^jj'^dlL] , is defined as 



S= H-Eg 



(54) 



The 'residual variance principle' requires the minimisa- 
tion of the integral of 5^ over all space with respect to 
variations in the wavefunction [23j . The parameter Eq 
may be viewed as a further variational parameter, giving 
the 'residual variance' 



F52=E {EL-EtotY 



(55) 



where Etot is the expectation value of the total energy of 
the trial wavefunction as defined in the previous section. 
This residual variance is zero when is an eigenstate of 
the Hamiltonian, and positive otherwise. 

The standard unbiased estimate for this quantity, con- 
structed with 'standard sampling' and r samples in en- 
ergy space, is then given by 



A. [Vs 



1 



1 ^ 

n=l 



(En-Ar [Etot]) 



(56) 



In a similar manner to the total energy estimate it is 
often assumed (whether explicitly or implicitly) that the 
CLT characterises the random error in this estimate. 

The PDF of this estimate of the residual variance is 
of interest in its own right, as for 'standard sampling' 
it provides the confidence interval for the total energy 
estimate (via the valid CLT assumption for the total 
energy). More importantly, the residual variance is of- 
ten the quantity that is minimised when optimising trial 
wavefunctions, hence the statistics of errors in its esti- 
mate may well decide the success or failure of an attempt 
to optimise a candidate wavefunction. 

In order to express the sum of squares of random 
variables in Eq. (j56p as a sum of random variables, 
U„ = — 1 is defined, whose PDF can be expressed 
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in terms of the seed distribution p{x) as 

Pv{u) = 2\u + l\^/^ ^^'^^ " VuTT)+p{x = + 1)] 

(57) 

for u > — 1, and otherwise. Due to the x asymptotic 
behaviour of the seed distribution, this PDF exhibits the 
asymptotic behaviour 



hm pi,(u) ~ 1/m^/^, 



(58) 



and the second moment of Pv{u) is not defined, hence 
none of the CLT theorems are vahd. 

From this it follows that the random error in the es- 
timated residual variance does not approach a Normal 
distribution, confidence intervals are not provided by the 
error function, and the sample variance does not pro- 
vide a measure of the random error. This is the case 
despite the fact that the sample variance will be finite 
for any number of samples, as it will approach infinity as 
the number of samples is increased. However, the strong 
law of large numbers (LLN) is still valid, as Pv{u) does 
possess a finite meanf5'|. 

A general form of the distribution of the random er- 
ror is derived in what follows, providing a limit theo- 
rem that takes the place of the CLT. The existence of 
alternative limit theorems (that result in 'infinitely di- 
visible forms' for the distribution, also known as 'Levy 
skew alpha-stable distributions' or 'Stable distributions') 
that are valid for classes of PDF functions is well known 
in statistics, [1, [23| with the CLT and resulting Normal 
distribution being the most familiar example. 

The notation is simplified by defining two supplemen- 
tary random variables. A sum of r IID random variables 
with distribution py (u) is denoted S,- , and a normalised 
sum is denoted V, such that 



V 



Ui 



r.2/3 



^2/3- 



(59) 



With these definitions the transformation from V to 
Ar [V52] is given by 



52 J 



V 



l a' 



(60) 



Following the same approach as for the total energy, 
the PDF of Sr is given by 



PriSr) ^ Pv{Ur) * Pr-liSr-l), 



(61) 



and the characteristic functions of U and Sr are related 

by 



In order to continue, a series expansion of the logarithm 
of Pv{k) is required. For the total energy estimate the 
analogue of this was obtained by closed contour integra- 
tion in the complex plane, however this is not appropri- 
ate for py{k) due to the presence of fractional powers. 
A different route consists of reintroducing the original 
variable, x, into the Fourier transform, giving 



Pv{k)e 



-ik 



p{x)e 



(63) 



which may be performed as a bilateral Laplace 
transform [2l| to give the general series expansion 



lnp„(fc) = -A; 



o(|fc|5/2 



(1-i sgn[fc])|fc|3/2 + A4fc^ 



(64) 



where no linear term appears as the mean of Pv{u) is 
zero (due to the offset in the definition of U„). Note the 
discontinuity introduced by a sign function, sgn[A;], that 
is equal to +1 for positive fc, — 1 for negative fc, and whose 
definition is irrelevant at fc = 0. 

This provides the required expression for Pr{k), 



Pr{k) 



exp 



-rAg;^ (1 - i sgn[fc]) \k\^^^ + rX^k^ 

(65) 



-O {r\kf^') 



Changing variables to w = r'^^^k and v = s^/r^/'^, and 
performing the inverse Fourier transform results in the 
PDF of the normalised sum V, 



Pr{v) 



■3V^ 



(1 



?nM) 



X exp 



AA 2 



O 



w 



,5/2 



p2/3 



dw. 



.|3/2 



(66) 



The lowest order terms are independent of r due to the 
normalisation chosen for V. Expanding the second expo- 
nential as a power series for large r gives 



Priv) 



A4 



Xo{v) + — 02(w) 



(67) 



(62) where 
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and 



1 r°° 



= ^ / (iu')«exp 



iwt) - A3— ^ (1 - i sgn[w]) Iwl^/^ 



iwv - A3— ^ (1 - i sgn[w]) 



dw 



(68) 



(69) 



r 



and differentiation with respect to v iteratively provides 
terms of higher q from xo a-nd 00- The lowest order term 
in this expansion provides the distribution of the estimate 
in the large r limit, and is a particular case of the class 
of Stable Distributions (20]. 

A transformation of the characteristic function to an 
explicit representation of Xo('^) is not available in the lit- 



J 



erature, and is a non-trivial integral. Although a strictly 
closed form representation is not available, here the in- 
tegral is performed analytically to provide the resulting 
distribution in a concise form employing Bessel functions. 
The derivation is given in the appendix, and provides the 
estimate of the residual variance, [V52], as a random 
variable with a PDF given by 



lim Pr{x) 
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where a; is a supplementary variable integrated over to 
obtain probabilities, cr^ is the variance of the underly- 
ing seed distribution of local energies, and 7 is the scale 
parameter for the distribution defined by 

1/3 



7 = 



6Ai 



(71) 



This distribution of unbiased estimates of the residual 
variance in 'standard sampling' takes the place of the 
Normal distribution that occurs for a valid CLT. 

The parameter A3 is the same as that in the analy- 
sis of the total energy estimate, and is a measure of the 
magnitude of the leptokurtotic tails in the seed distri- 
bution. The 'width' 7 is not related to the variance of 
the distribution itself - the mean and variance of Pr{x) 
are and 00 respectively. Although this width param- 
eter approaches zero for increasing r, it does so as r~^/^ 
(the analogous width parameter for the CLT decreases as 
J,- 1/2 -J r^Yie asymptotic behaviour of Pr{x) is given by 



lim Prix) 



Gtt 27 V a; 



5/2 



(72) 



showing that the leptokurtotic behaviour of the PDF for 
U = — 1 is preserved. This is the dominant part of the 
asymptotic behaviour even for finite r, as it can easily be 
shown that the additional terms decay faster than x^^/^. 

Equation ((70|) is a general result for the statistics of es- 
timates of the residual variance for 'standard sampling' 
in VMC (it is also a general result for a sum of IID ran- 
dom variables whose PDF possesses a one sided x~^/^ 



asymptote). General conclusions may be drawn from 
this distribution. The most important result is that the 
CLT does not apply, but the LLN does. It is apparent 
that although confidence intervals exist for an estimate 
of the residual variance, they are completely unrelated to 
a sample variance, and confidence intervals obtained us- 
ing the error function and sample variance are unrelated 
to the distribution of errors even though they could be 
calculated. 

Since no unbiased estimate exists for A3 (or 7), only 
the biased estimates considered earlier can be used to 
construct confidence intervals. In addition, closer exam- 
ination of the form of the distribution reveals that the 
mean may be outside of the confidence interval, since the 
mode and median do not coincide. Another observation 
is that, with increasing confidence, a lower bound of the 
confidence interval decreases slowly (slower than the CLT 
would predict), but the upper bound rapidly becomes far 
larger than that predicted by the CLT. 

Figure [5] shows the general form of the distribution in 
the limit of large r, together with a kernel estimate of the 
same distribution constructed from 10** residual variance 
estimates, each from r = 10^ local energy samples for the 
all-electron carbon atom considered for total energy esti- 
mates. For comparison, a Normal distribution resulting 
from blindly applying the CLT using the mean and vari- 
ance of the sampled data is also shown. It is clear that 
the distribution of estimated residual variance is far from 
Normal, and it should be remembered that the width of 
the Normal distribution (in units of 27) shown in the fig- 
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{v - a^)/{2^) 

FIG. 5: Probability density function for the random error in 
the estimated residual variance. Results shown are for a ker- 
nel estimate of the PDF resulting from 10* estimates with 
r = 10^ for each estimate (black). Grey lines show the pre- 
dicted large r limit (a Stable PDF). For comparison, a Normal 
distribution with a mean and variance taken to be the sample 
mean and variance of the data is also shown (dotted line). 



finite sample size effects would be readily available. 

These results do not invalidate the current use of 'stan- 
dard sampling' for total energy or variance estimates, 
since these estimates still converge to the expectation 
values for increasing r. The difficulty is that estimates of 
the random error in these quantities are not available, ft 
may be that assuming 'r is large enough' provides practi- 
cal estimates of the error in the total energies estimates, 
but whether this is the case depends on more than the 
sample moments. Errors in the residual variance esti- 
mates are unavoidably not Normal, even in the large r 
limit, and the probability of outliers occurring does not 
fall off exponentially with r, but as a power law. 

Estimated total energies and residual variance were 
chosen for consideration because of the central role played 
by these quantities in QMC methods. In the next section 
the results of a similar analysis of the 'standard sampling' 
estimates for other physical quantities is described, to 
show that the emergence of a non- Normal distribution of 
errors with power law tails is not limited to estimates of 
the residual variance. 



IV. OTHER ESTIMATES 



ure diverges with increasing number of residual variance 
estimates. 

Observing that the limiting distribution describes the 
carbon data well, and that r — IQ^ is a relatively small 
number of sample points, suggests that the large r limit 
has been reached in this case. For less accurate trial 
wavefunctions this may not be the case. Since the devia- 
tion from the large r limit has a magnitude proportional 
to r~^/^ this should be justified on a case by case basis. 

The significance of the deviation from the Normal dis- 
tribution may best be estimated by considering the pre- 
dicted number of estimates in the interval (w — cr^)/27 > 
2, for 10"' estimates. Incorrectly assuming the validity of 
the CLT predicts 0.0 outhers, Eq. ^ (with A3 = 1.0) 
predicts ~ 266 outliers, whereas the numerical data pro- 
vides 198 estimates in this interval. Confidence intervals 
could be defined using Eq. ([70)1 and estimates of the pa- 
rameter A3. This is not carried out here. A variety of 
methods for the estimation of parameters such as A3 do 
exist, but are inherently biased [25|. 

It appears that the most important non-Gaussian fea- 
tures of the distribution of sample residual variance esti- 
mates are that 7 cx r~^/^, and that outliers are likely. 

Results for the estimate of both the total energy and 
the residual variance may be summarised in the state- 
ment that the 'standard sampling' method does not sam- 
ple the tails sufficiently to provide a statistically ac- 
curate measure of their contribution to estimates. Were 
these leptokurtotic tails to be absent, none of the defi- 
ciencies described above would be present - all moments 
of the local energy distribution would exist, leptokurtotic 
tails could not occur, and unbiased estimates that include 



The analysis given in the preceding sections can be 
applied to general estimates in 'standard sampling' VMC 
to obtain the distribution of the accompanying random 
error. Ideally, it would be hoped that accurate confidence 
limits would be available as a result of the CLT being 
valid in its strongest form. 

In this section estimates of the expectation value of 
several operators are considered, and these take the gen- 
eral form 



1 

A, [X] = -^xl(R„), 



(73) 



a mean of a local quantity xl ■ Singularities in xl can be 
classified by location as type 1,2, or 3 in the same manner 
as for the local energy singularities, but the order of the 
singularities is generally different. The distribution of 
the estimates themselves are then obtained via the same 
surface integration and generalised central limit theorem 
approach used for the local energy. 



A. Kinetic energy and Potential energy 

The most straightforward estimate for the electronic 
kinetic energy is provided by the kinetic part of the local 
energy. 



2 tP 



(74) 



This possesses type 1 and 2 singularities if the Kato cusp 
conditions are satisfied, and type 3 singularities unless 
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V^V" = at the nodal surface. These singularities re- 
sult in a Normal distribution of estimates in the large 
r limit, with 'lopsided' tails in the PDF that decay 
with increasing r. However, the presence of type 1 and 
2 singularities is expected to result in larger x~'^ tails in 
the PDF of the kinetic energy estimate than for the total 
energy estimate. 

An alternative estimator for the kinetic energy is pro- 
vided via Green's 1** theorem, and takes the form of the 
sample average of the random variable 



E 



(75) 



-I R„ 



where Vj denotes the gradient with respect to the co- 
ordinate of electron i. Type 1 and 2 singularities are 
not present since the gradient of the wavefunction pos- 
sesses no singularities. Type 3 singularities arise from 
the quadratic behaviour of V'^ about the nodal surface, 
resulting in a positive x~^/'^ tail in the PDF of the sam- 
pled random variable and no CLT. The resulting PDF 
of kinetic energy estimates is the same one sided Stable 
PDF as for the residual variance estimates, with infinite 
variance and a x~^l'^ power law tail. 

Two potential energy estimates follow naturally from 
the two kinetic energy estimates and the total energy es- 
timate. One of these possesses type 1 and 2 singularities, 
and results in a weakly valid CLT with strong x^^ tails. 
The second possesses type 3 singularities only, which re- 
sult in no valid CLT, and the same one sided Stable PDF 
as the residual variance estimate, with infinite variance 
and a a;~^/^ power law tail. 



B. Non-local Pseudopotentials 

For systems described using non-local pseudopoten- 
tials, the local energy estimate takes the form 



VMC via perturbation theory, and the required estimates 
are available in the literature [2^, [2^. These generally 
possess singularities of all three types, and result in x^^l"^ 
tails in the PDF of the sampled local variable. As a direct 
consequence of these tails the CLT is not valid and the 
large sample size limit of the distribution of estimates is 
not Normal, but a two sided variant of the Stable PDF 
found for the residual variance estimate, that is with a 
finite mean, an infinite variance, and two sided x'^^l'^ 
power law tails. 



D. Atomic force estimates 

For estimates of atomic forces the 'local Hellmann- 
Feynman force' is commonly taken to possess the 
form(23 



(77) 



where Vx is the gradient with respect to the nucleus co- 
ordinate(s), X, evaluated at the nucleus positions of in- 
terest, and Y is the sum of one-body potential energy op- 
erators due to each atomic nucleus in the system. (Both 
the operator and the trial wavefunction are functions of 
the nucleus position.) 

For the special case where F is a local potential the 
wavefunction cancels, and the gradient operator acts on 
the multiplicative potential only. For smooth local po- 
tentials no singularities arise, and the CLT is valid for the 
resulting estimate. For a Coulomb potential type 1 singu- 
larities arise, and result in estimates whose distribution 
in the large sample size limit is a two sided Stable law 
of finite mean, infinite variance, and with x^^l"^ power 
law tails. For smooth non-local pseudopotentials type 3 
singularities arise, and result in estimates whose distri- 
bution in the large sample size limit is, again, a two sided 
Stable law with x~^l'^ power law tails. 



a;t(Rn) = 



(76) 



E. Linearised basis optimisation 



where V is the sum of one-body non-local operators that 
make up the pseudopotential. Provided the pseudopo- 
tential is not singular these do not possess type 1 singu- 
larities, and type 2 singularities may be prevented using 
the usual Kato cusp conditions. However, strong type 3 
singularities can be expected at the nodal surface, result- 
ing in tails in the sample PDF. Hence, for non-local 
pseudopotentials, the CLT is expected to be weakly valid, 
with slowly decaying x^** tails that are larger than for the 
local potential case. 



C. Mass polarisation and relativistic terms 

Corrections to the total energy due to finite nucleus 
mass and some relativistic effects may be implemented in 



A wavefunction optimisation strategy has recently 
been developed [28l [29| that linearises the influence of 
variational parameters on the total energy by construct- 
ing a basis set from derivatives of the trial wavefunction 
with respect to parameters of the wavefunction, a^. Ap- 
plying the total energy variational principle results in 
a matrix diagonalisation problem, with matrix elements 
defined by integrals that are estimated as means of the 
sample values 



(78) 



J R„ 



with ipi the derivative of the trial wavefunction with re- 
spect to parameters a^, except for ipo = ijj. A is either 
the identity or the Hamiltonian operator. 
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Generally, the linear behaviour of the wavefunction as 
the nodal surface is crossed introduces singularities in 
the sampled quantity, resulting in x~^/'^ tails in the PDF. 
These result in an invalid CLT, and the estimated matrix 
elements have a PDF (in the large sample size limit) of 
the same form as for the estimate of the residual variance 
- the one sided Stable distribution with infinite variance. 
Some exceptions occur for particular matrix elements; for 
the Hamiltonian operator the distribution of the estimate 
is weakly Normal for z = 0, and for the identity operator 
the CLT is weakly valid for i or j 0, and the 
variance is zero for i = j = Q. 

Although this informs us of the distribution of each es- 
timated matrix element, it provides no direct information 
on the correlation between elements, or of the distribu- 
tion of the lowest eigenvalue of the estimated matrix [30,]. 
However, it seems likely that the invalidity of the CLT 
makes a significant contribution to the instabilities that 
must be carefully controlled for an implementation of this 
optimisation method to be successful. 



V. CONCLUSION 

The sampling distribution for a local quantity can be 
simplified by reducing the 37V-dimensional distribution 
to the degrees of freedom of the local quantity that is 
sampled, with derivable asymptotic behaviour. Such an 
analysis has been applied here to characterise the random 
error for the two most important estimated quantities 
in variational QMC, the total energy and the residual 
variance. 

For estimates of the total energy within the 'standard 
sampling' implementation of VMC, the CLT is found to 
be valid in its weakest form with the consequence that 
the influence of finite sample size is not obvious and must 
be considered on a case by case basis. Outliers have been 
found to be significantly more likely than suggested by 
CLT confidence limits. No rigorous bounds exist that 
provide limits to the deviation from the CLT for finite r, 
and consequently confidence intervals based on the CLT 
may be misleading. However, for the example case of an 
all-electron isolated carbon atom and an accurate trial 
wavefunction the assumption of large sample size appears 
to be useful. 

The variance of the local energy has also been consid- 
ered in light of the primary role played by this and sim- 
ilar quantities in wavefunction optimisation procedures. 
A statistical variance of the local energy within 'standard 
sampling' is equivalent to the residual variance defined in 
terms of the Hamiltonian and trial wavefunction them- 
selves, and the statistics of the estimate of this quantity 
have been investigated. 

For estimates of the variance within the 'standard sam- 
pling' implementation the CLT is found to be invalid. A 
more general Stable distribution and generalised central 
limit theorem take the place of the Normal distribution 
and CLT, and this Stable distribution is fundamentally 



diff'erent from the Normal distribution. It possesses tails 
that decay algebraically, and so outliers are many orders 
of magnitude more likely than suggested by the CLT. 
The width scale of this distribution falls as r~^^^ , signif- 
icantly slower than the r^^/"^ scaling that would result 
from a valid CLT. The distribution is asymmetric, so the 
mean and mode do not coincide. Only biased estimates of 
the parameters of this distribution (other than its mean) 
are available, and confidence intervals based on the CLT 
are entirely invalid. 

In order to demonstrate that this is not a statistical 
issue particular to estimating the residual variance, esti- 
mates of the expectation values of several other operators 
have also been considered. For most of these the CLT is 
found to be invalid, with the same or a similar distribu- 
tion of random error arising as for the residual sampling 
estimate - the Stable distribution with x^^/^ asymptotic 
tails and infinite variance. 

Perhaps the most important consequence of these re- 
sults arises in the context of the minimisation of the resid- 
ual variance and related quantities carried out to opti- 
mise a trial wavefunction. Many of the instabilities en- 
countered in different optimisation methods [isl. [l6t may 
be due to the use of estimates that are statistically faulty. 

By shedding an assumption about the properties of 
QMC estimates and replacing this with a derivation of 
the true distribution of random errors, it has been shown 
that deviations from the CLT are not trivial and can 
be expected to have a significant influence on the accu- 
racy and reliability of estimated physical quantities and 
optimisation strategies within QMC. The analysis itself 
provides a new explicit (but not rigorously closed) ex- 
pression for a particular Stable law PDF, and a general 
approach to assessing the strengths and failures of gen- 
eral sampling strategy/trial wavefunction combinations 
for estimating expectation values of physical quantities 
in QMC. 
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APPENDIX 



Defining a^/^ = gives xo of Eq. ^ as 



-a3/2 (1 - i sgn[w]) 



.|3/2 



(A.l) 

Partitioning the integral into the negative and positive 
ranges gives 



Xo(v) =/l(v)+/2(«), 



(A.2) 
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with Ii and I2 integrals taken over < w < 00 and 
—00 < w < 0, respectively. Substituting w — results 
in 



h{v) 



1 

2^ 



2j/exp 



dy, 
(A.3) 



and, for I2, substituting w = — j/^ results in 

poo 







h{v) = — / 2yexp -m/ - a^''^{l i)y^ dy 



These two identities provide 



(A.4) 



Xoiv) = Ii{v)+h{vy 

1, 



-3? 



2yexp 



ivy'^ - 0^/2(1 - i)y^ 



dy . 
(A.5) 
I 



The next step is to obtain the real part of the integral 
in this expression. This can be achieved by converting 
this integral into an ODE for Xo{'^)^ ^^^d then seeking 
the solutions that are real and normalised. 

First define Gn by 



G„(«) = / 22/"exp 
Jo 



so that 



ivy'^ - a^''^{l-i)y^ dy, (A.6) 



Xo(«) = -5R[Gi(«)] 

TT 



(A.7) 



Equations that relate G„ for different indices may be 
derived. The first of these is obtained by integrating the 
derivative of the exponential function in the integrand to 
give 



(2w?/- 3a^/2(l - 



exp 



ivy^ - a^/^(l - i)y^ 
I 



dy = exp 



ivy^ - a^/2(l - i)y^ 



v=0 



(A.8) 



In addition integrating Gn by parts provides the relation and 



(n + 1)G„ = -2z^;G„+2 + ?,a^'^{l - i)G„+3. 
These two expressions provide the equations 



- 1 
Gi 
G2 



ivGi - -a3/2(l-i)G2 

-ivG^-^a^'^{l-i)Gi 
-'^ivGi + a^/'\l-i)G^, 



(A.9) 

(A.IO) 
(A.ll) 
(A.12) 



where the first arises from evaluating the limits in 
Eq. (|A.8|1 explicitly and expressing the LHS in terms of 
Gi and G2 and the following two arise from Eq. (|A.9p for 
n = 1,2. 

Combining these equations to remove G2 and G4, and 
noting that = iG:>,{v) and = —G^{v) provides 



ga^G'/ - 2v^G\ - bvGi = -3i. 
Making the substitutions 

Gx{v)=v'^e^^y g{v) 



(A.13) 



(A.14) 



f- 

V3a 



(A.15) 

further simplifies this ODE, and results in the inhomo- 
geneous ODE 



x^g" + 2xg' 



1 



27a3 



(A.16) 



Only the real solutions of this equations are required, 
hence only the homogeneous ODE 



x'g" 



2xg' 



^-9 )ff=0 



(A.17) 



need be considered. The required solution is finite for 
X ±00 and continuous at a; = 0, and is a sum of two 
modified Bessel functions of the second kind, 

g{x) = A [-sgn(x)i^i/3 {\x\) + Ky, {\x\)] , (A.18) 

with A an undefined constant. 

Requiring Eq. (|A.10|) to be true for v — provides A, 
and transforming back to v provides the final result 



Xo(w) 



TT (3a)3 



sgn(w)i^i/3 ^ 


V 




V 




3a 


+ ^2/3 


3a 





(A.19) 
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The transformation between v and a more general vari- 
able is described in the main text. 

This provides an explicit form for the PDF of the Sta- 
ble distribution S (3/2, —1, 7, S; 1) (using the notation of 



Nolan[24'|) - Eq. ((XT9)) is for (7, S) = {a, 0) and the gen- 
eral form is trivially related to this by rescaling and trans- 
lation. 
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